Bulk Loading :
Building a KD-Tree involves recursively partitioning the space by alternating axes at each level. At each recursive step, we choose the median point along the current axis to ensure a balanced tree. Finding the median takes O(n) and we perform this for log n levels (since the tree height is log n in a balanced case), giving a total complexity of O(n log n). This balance helps ensure efficient querying later on.
Insertion:
To insert a new point, we start at the root and recursively compare the point’s coordinate to the current node’s coordinate on the corresponding axis. We follow a single path down the tree, choosing left or right based on the comparison. In a well-balanced tree, this path has a depth of log n, resulting in O(log n) time. However, if the tree becomes unbalanced (e.g., due to inserting sorted or clustered data), the depth can grow to n, leading to a worst-case complexity of O(n).
Deletion :
Deletion is more complex than insertion because we may need to restructure part of the tree. After finding the node to delete (which takes O(log n) on average), we have to replace it with a suitable candidate (e.g., the minimum node in the subtree along the current axis). This may involve scanning a subtree and rebalancing parts of the tree. So while average performance remains O(log n), worst-case deletion can degenerate to O(n) if the tree is skewed or contains degenerate structures.
Range Query :
A range query checks which points fall inside a given region (e.g., rectangle in 2D). Instead of scanning all points, we prune entire subtrees if their bounding regions lie outside the query. On average, this leads to visiting about O(n
1−1/k) nodes, where k is the number of dimensions. We also have to return m points that actually lie inside the query, so total complexity becomes O(n
1−1/k + m). In the worst case—such as when the query covers the whole space—we may visit all nodes, giving O(n).
k-NN Query :
The k-nearest neighbors query searches for the closest k points to a given location. KD-Trees use a backtracking search with a priority queue to explore only the most promising branches first, pruning away regions that cannot contain closer points. On average, this gives O(k log n) time since we often don’t need to examine the entire tree. However, in the worst case (such as high dimensions or poor balance), the algorithm may need to explore nearly all nodes, leading to O(kn) time.